Wavelet-based relative prefix sum methods for range sum queries in data cubes

Author

Daniel Lemire

Abstract

Data mining and related applications often rely on extensive range sum queries, so it is important for these queries to scale well. Range sum queries in data cubes can be answered in $O(1)$ time using prefix sum aggregates, but prefix sum update costs are proportional to the size of the data cube, $O(n^d)$ for a $d$-dimensional cube with $n$ cells along each dimension. Using the Relative Prefix Sum (RPS) method, the update cost can be reduced to the square root of the size of the data cube, $O(n^{d/2})$. We present a new family of base-$b$ wavelet algorithms that further reduces the update cost to $O(n^{d/\beta})$ for arbitrarily large $\beta$ while preserving constant-time queries. We also show that this approach leads to $O(\log^d n)$ query and update methods that are twice as fast as Haar-based methods. Moreover, since these new methods are pyramidal, they provide incrementally improving estimates.
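To make the update-cost trade-off concrete, here is a minimal Python sketch of the one-dimensional case ($d = 1$), where plain prefix sums cost $O(n)$ per update and RPS costs $O(\sqrt{n})$, while both answer queries with a constant number of lookups. It assumes a simplified 1D reading of the RPS layout (an "overlay" holding the total of everything before each block, plus prefix sums relative to the start of each block) with block size $k = \lfloor \sqrt{n} \rfloor$; the class names and block size are illustrative assumptions, and this is not the paper's base-$b$ wavelet method.

```python
import math


class PrefixSumArray:
    """Plain prefix sums: O(1) range queries, O(n) worst-case updates."""

    def __init__(self, data):
        self.p = [0] * (len(data) + 1)  # p[i] = data[0] + ... + data[i-1]
        for i, x in enumerate(data):
            self.p[i + 1] = self.p[i] + x

    def range_sum(self, lo, hi):
        """Sum of data[lo..hi] inclusive, using two lookups."""
        return self.p[hi + 1] - self.p[lo]

    def add(self, i, delta):
        """Add delta to data[i]; every later prefix changes. Returns cells touched."""
        touched = 0
        for j in range(i + 1, len(self.p)):
            self.p[j] += delta
            touched += 1
        return touched


class RelativePrefixSumArray:
    """Simplified 1D relative prefix sums with blocks of size k = isqrt(n)."""

    def __init__(self, data):
        self.n = len(data)
        self.k = max(1, math.isqrt(self.n))
        num_blocks = (self.n + self.k - 1) // self.k
        self.rp = [0] * self.n            # sum from the start of i's block through i
        self.overlay = [0] * num_blocks   # sum of all cells before block b
        running = 0
        for b in range(num_blocks):
            self.overlay[b] = running
            acc = 0
            for i in range(b * self.k, min((b + 1) * self.k, self.n)):
                acc += data[i]
                self.rp[i] = acc
            running += acc

    def prefix(self, i):
        """data[0] + ... + data[i] in O(1): block offset plus relative prefix."""
        if i < 0:
            return 0
        return self.overlay[i // self.k] + self.rp[i]

    def range_sum(self, lo, hi):
        return self.prefix(hi) - self.prefix(lo - 1)

    def add(self, i, delta):
        """Touch the rest of i's block and every later overlay entry: O(sqrt n)."""
        touched = 0
        b = i // self.k
        for j in range(i, min((b + 1) * self.k, self.n)):
            self.rp[j] += delta
            touched += 1
        for c in range(b + 1, len(self.overlay)):
            self.overlay[c] += delta
            touched += 1
        return touched


data = list(range(1, 101))  # 100 cells, so k = 10
ps, rps = PrefixSumArray(data), RelativePrefixSumArray(data)
assert ps.range_sum(10, 19) == rps.range_sum(10, 19) == sum(data[10:20])
print(ps.add(0, 5), rps.add(0, 5))  # cells touched by one update: 100 19
```

Updating the first cell is the worst case for both structures: plain prefix sums rewrite all 100 entries, while the RPS sketch rewrites the 10 cells of the first block and the 9 later overlay entries.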


Similar resources


Relative Prefix Sums: An Efficient Approach for Querying Dynamic OLAP Data Cubes

Range sum queries on data cubes are a powerful tool for analysis. A range sum query applies an aggregation operation (e.g., SUM) over all selected cells in a data cube, where the selection is specified by providing ranges of values for numeric dimensions. Many application domains require that information provided by analysis tools be current or "near-current." Existing techniques for range sum ...
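A range sum query of this kind, a SUM over the cells selected by one range per dimension, can be answered in constant time from precomputed prefix sums by inclusion-exclusion: in two dimensions each query reads four prefix-sum cells, and in $d$ dimensions it reads $2^d$. The following is a minimal sketch, assuming a dense two-dimensional cube stored as a list of lists; `build_prefix_2d` and `range_sum_2d` are hypothetical names used only for illustration.

```python
def build_prefix_2d(cube):
    """P[i][j] = sum of cube[r][c] for r < i and c < j (zero-padded first row/column)."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum_2d(P, r0, r1, c0, c1):
    """SUM over cells with r0 <= row <= r1 and c0 <= col <= c1, in four lookups."""
    return P[r1 + 1][c1 + 1] - P[r0][c1 + 1] - P[r1 + 1][c0] + P[r0][c0]


cube = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
P = build_prefix_2d(cube)
print(range_sum_2d(P, 1, 2, 0, 1))  # rows 1..2, columns 0..1: 4 + 5 + 7 + 8 = 24
```

The price of this constant-time query is on the update side: changing one cell can change every prefix entry that covers it, which is the cost RPS is designed to reduce.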


On the Optimality of the Greedy Heuristic in Wavelet Synopses for Range Queries

In recent years, wavelet-based synopses were shown to be effective for approximate queries in database systems. The simplest wavelet synopses are constructed by computing the Haar transform over a vector consisting of either the raw data or the prefix sums of the data, and using a greedy heuristic to select the wavelet coefficients that are kept in the synopsis. The greedy heuristic is known to ...
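The prefix-sum Haar synopsis described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the cited paper's code: the input length must be a power of two, the transform uses the orthonormal Haar normalization (so by Parseval's identity, keeping the largest-magnitude coefficients minimizes the L2 error of the reconstructed prefix-sum vector), and the function names are hypothetical.

```python
import itertools
import math

SQRT2 = math.sqrt(2)


def haar_orthonormal(values):
    """Orthonormal Haar transform; len(values) must be a power of two."""
    out = [float(x) for x in values]
    length = len(out)
    while length > 1:
        half = length // 2
        pairs = [(out[2 * i], out[2 * i + 1]) for i in range(half)]
        out[:half] = [(a + b) / SQRT2 for a, b in pairs]        # scaled averages
        out[half:length] = [(a - b) / SQRT2 for a, b in pairs]  # scaled details
        length = half
    return out


def inverse_haar_orthonormal(coeffs):
    """Invert haar_orthonormal level by level, coarsest level first."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * length] = merged
        length *= 2
    return out


def greedy_synopsis(coeffs, budget):
    """Keep the `budget` largest-magnitude coefficients and zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


# Usage: build a synopsis of the prefix sums, then estimate a range sum from it.
data = [3, 1, 4, 1, 5, 9, 2, 6]
prefix = list(itertools.accumulate(data))  # prefix[i] = data[0] + ... + data[i]
synopsis = greedy_synopsis(haar_orthonormal(prefix), budget=3)
approx_prefix = inverse_haar_orthonormal(synopsis)
estimate = approx_prefix[6] - approx_prefix[1]  # approximates data[2] + ... + data[6] = 21
```

Working on prefix sums means any range sum is estimated from just two reconstructed entries; whether greedy selection is the best choice for range-query error, rather than for L2 error on the prefix vector, is the question the cited work addresses.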


Data Cubes in Dynamic Environments

The data cube, also known in the OLAP community as the multidimensional database, is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. Previous research mainly focussed on strategies for supporting queries, assuming that updates do not play an important role and can be propagated to the data cube in batches. While this might be ...
